Abstract

There has been significant recent progress in solving the long-standing problems
of how nuclear shell structure and collective motion emerge from underlying
microscopic inter-nucleon interactions. We review a selection of recent
significant results within the ab initio No Core Shell Model (NCSM) closely
tied to three major factors enabling this progress: (1) improved nuclear
interactions that accurately describe the experimental two-nucleon and three-nucleon
interaction data; (2) advances in algorithms to simulate the quantum many-body
problem with strong interactions; and (3) continued rapid development of
high-performance computers now capable of performing 20 x $10^{15}$ floating point
operations per second. We also comment on prospects for further developments.
1 Introduction

The ab initio No Core Shell Model (NCSM), using realistic microscopic nucleon-nucleon
(NN) and three-nucleon forces (3NFs), has proven to be a powerful combination
for describing and predicting properties of light nuclei mm-. The Hamiltonian
framework results in a large sparse matrix eigenvalue problem for which we seek the
low-lying eigenvalues and eigenvectors to compare with experimental data
and to make testable predictions. Given the rapid advances in hardware with frequent
disruptions in architecture, it has become essential for physicists, computer scientists
and applied mathematicians to work in close collaboration in order to achieve efficient
solutions to forefront physics problems. Fortunately, US funding agencies have
recognized these challenges at the interface of science and technology and have provided
support leading to our recent successes [HHn].

We present here a selection of recent results for light nuclei and neutron drops in
external traps and set out some of the challenges that lie ahead. The results include
both those utilizing the JISP16 NN interaction and those using chiral effective field
theory NN plus 3N interactions. We also present a selection of algorithms developed
for high-performance computers that are helping to rapidly pave the way to efficient
utilization of exascale machines ($10^{18}$ floating point operations per second). We
illustrate the scientific progress attained with multi-disciplinary teams of physicists,
computer scientists and applied mathematicians.

This paper aims to complement presentations at this meeting that cover
closely-related topics. In this connection, it is important to point especially to the
papers by Dytrych et al. [TH], by Abe et al. [S], by Shirokov et al. [5D] and by Mazur
et al. m-. We therefore focus here on the following recent results: (1) demonstrating
the emergence of collective rotations in light nuclei; (2) achieving an accurate
description of the properties of 12C with chiral Hamiltonians; (3) solving for properties of
neutron drops with chiral Hamiltonians; (4) development of techniques for efficient
use of computational accelerators; and (5) development of techniques for overlapping
communication and computation.


2 Emergence of collective rotations 

NCSM calculations of various types have been used to demonstrate the emergence 
of collective rotational correlations in p-shell nuclei, including ®Li [THllH], the Be 
isotopes [22-25], and ^^C [26]. Here we focus on the results for the Be isotopes solved
in the No Core Full Configuration (NCFC) framework [4l6ll7] using the realistic JISP16 
NN interaction [23|28] with the M-scheme harmonic oscillator (HO) basis. The 
NCFC framework uses many of the same techniques as the NCSM but additionally 
features extrapolations of observables to the infinite matrix limit [I]. 
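The text does not spell out the extrapolation procedure at this point. As a minimal sketch, assuming the frequently used exponential ansatz $E(N_{\max}) = E_\infty + c\,e^{-b N_{\max}}$ and three results at equally spaced $N_{\max}$, the infinite-matrix estimate follows in closed form. The function name and the ansatz itself are illustrative assumptions, not necessarily the scheme used in the cited work.

```python
import math


def extrapolate_exponential(e1, e2, e3):
    """Estimate E_inf from three energies at equally spaced N_max values.

    Assumes E(N_max) = E_inf + c * exp(-b * N_max), so that successive
    differences form a geometric sequence with ratio exp(-b * step).
    The energies must be ordered by increasing N_max.
    """
    d1 = e1 - e2
    d2 = e2 - e3
    if d1 == 0.0:
        raise ValueError("first two energies are identical; cannot extrapolate")
    ratio = d2 / d1
    if not 0.0 < ratio < 1.0:
        raise ValueError("energies do not converge geometrically")
    # Sum of the remaining geometric series: d2 * ratio / (1 - ratio)
    return e3 - d2 * d2 / (d1 - d2)


def synthetic_energy(n_max, e_inf=-30.0, c=5.0, b=0.4):
    """Hypothetical test data that follow the assumed ansatz exactly."""
    return e_inf + c * math.exp(-b * n_max)
```

Applied to `synthetic_energy(6)`, `synthetic_energy(8)` and `synthetic_energy(10)`, the function recovers the chosen `e_inf`. For real calculations the spread between estimates from different triples of $N_{\max}$ values gives a rough handle on the extrapolation uncertainty.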

With no prior selection of our basis to favor solutions with collective motion and
using only the realistic bare NN interaction (i.e., we omit the Coulomb interaction to
ensure exact conservation of isospin, thereby simplifying the spectrum^1), we face the
task of analyzing our microscopic results and determining which particular states,
among the large number of calculated levels, exhibit signatures of collective nuclear
motion. We follow the path of calculating observables and post-analyzing their
systematics to infer that they follow the patterns prescribed by collective rotation. This
path is analogous to that taken when analyzing experimental data. When we
discover patterns appropriate to a collective band in our calculated results, we assign
the moniker of “collective motion” to our microscopic results. We further compare
the so-detected band with experimental results and find good agreement, which
further supports our discovery of emergent collective phenomena in light nuclei from the
underlying microscopic many-body theory.

^1 The primary effect of the Coulomb interaction is to shift the binding energies, which
would not affect our analysis of rotational band observables. New analysis including
Coulomb m confirms this.

The details of this step-by-step analysis may be found in Refs. [53H1S]. We
analyze the systematics of calculated excitation energies, quadrupole moments, dipole
moments, electric quadrupole transition strengths B(E2) and their reduced matrix
elements to isolate states which have a clear rotational band assignment from those
which do not. In this way, we have identified both ground state and excited state bands,
both natural and unnatural parity bands, and bands in even-even as well as in even-odd
nuclei.

Perhaps the most striking hallmark of collective rotation is the appearance of 
excited states with excitation energies that follow a simple pattern prescribed by the 
collective model. This pattern of collective rotational excitation energies is given in 
Eq. (1):

$$E(J) = E_0 + A\left[J(J+1) + a\,(-1)^{J+1/2}\left(J+\tfrac{1}{2}\right)\delta_{K,1/2}\right], \qquad (1)$$

where $E_0$ is an offset to properly position excited band heads relative to the lowest
band head, $a$ is the Coriolis decoupling parameter for $K = 1/2$ bands appearing in odd-$A$
nuclei, $J$ is the total angular momentum, and $A = \hbar^2/(2\mathcal{J})$, with $\mathcal{J}$ representing the
moment of inertia of the deformed nucleus.
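The rotational energy formula can be evaluated directly. The sketch below is illustrative only: the function name, argument names and any parameter values passed to it are assumptions, not quantities taken from the calculations.

```python
def rotational_energy(j, e0, a_rot, a_dec=0.0, k=0.0):
    """Energy of a band member with angular momentum j.

    e0    : band-head offset E_0
    a_rot : rotational parameter A = hbar^2 / (2 * moment of inertia)
    a_dec : Coriolis decoupling parameter a (used only when k == 0.5)
    k     : band quantum number K
    """
    term = j * (j + 1.0)
    if k == 0.5:
        # For half-integer j, j + 1/2 is an integer, so (-1)^(j+1/2) = +/-1.
        phase = -1.0 if round(j + 0.5) % 2 else 1.0
        term += a_dec * phase * (j + 0.5)
    return e0 + a_rot * term
```

For a $K = 0$ band the familiar ratio $E(4)/E(2) = 10/3$ follows when $E_0 = 0$, while for $K = 1/2$ the staggering term alternately lowers and raises successive band members.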

To be convinced that the states are indeed members of a rotational band one needs 
to find that these states also exhibit enhanced electromagnetic moments and transition 
rates that exhibit a dependence on angular momentum J that is also prescribed by the 
collective rotational model. We therefore adopt these additional criteria for assigning 
calculated states to rotational bands. It is worth noting here that, in light nuclei, 
gamma decay data are scarce due to the short-lived resonant nature of the states. 
Therefore, the calculations provide access to quantities that are typically inaccessible 
in experiment, yet crucial for confirming collectivity. 

We extract parameters of the traditional rotational description through fits to our
theoretical results after extrapolation to the infinite matrix limit (for extrapolation
details see Ref. [^), and we compare these extracted parameters with rotational
parameters determined from similar fits to the corresponding experimental data. The
energy parameters for bands across the Be isotopic chain are summarized in Fig. 1: the
band excitation energy $E_x$ (defined relative to the yrast band as $E_x = E_0 - E_{0,\mathrm{yrast}}$),
the band rotational parameter or slope $A$, and the band Coriolis decoupling parameter
or staggering $a$ (for $K = 1/2$).
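A fit of this kind can be sketched as an ordinary linear least-squares problem, because the rotational formula is linear in $E_0$, $A$ and the product $A\,a$. The code below is a minimal illustration under that assumption. The function names are hypothetical, and the fitting procedure actually used for the results discussed here is described in the cited references rather than reproduced.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular system: too few distinct J values?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_rotational_band(js, energies, k=0.0):
    """Least-squares fit of band energies; returns (E0, A, a).

    Fits E = E0 + A*J(J+1) + c*(-1)^(J+1/2)*(J+1/2) with c = A*a.
    The staggering term is included only for K = 1/2 bands.
    """
    rows = []
    for j in js:
        row = [1.0, j * (j + 1.0)]
        if k == 0.5:
            phase = -1.0 if round(j + 0.5) % 2 else 1.0
            row.append(phase * (j + 0.5))
        rows.append(row)
    n = len(rows[0])
    if len(rows) < n:
        raise ValueError("need at least as many levels as parameters")
    # Normal equations (X^T X) p = X^T y for the linear model
    xtx = [[sum(r[i] * r[m] for r in rows) for m in range(n)] for i in range(n)]
    xty = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(n)]
    params = solve_linear(xtx, xty)
    e0, a_rot = params[0], params[1]
    a_dec = params[2] / a_rot if k == 0.5 else 0.0
    return e0, a_rot, a_dec
```

The band excitation energy $E_x$ then follows by subtracting the fitted $E_0$ of the yrast band from that of the band in question.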

In total, we compare 23 theoretical and experimental collective rotation parameters
for energies in the 6 Be isotopes depicted in Fig. 1. Overall, the agreement
between theory and experiment is remarkable. Additional analyses of the calculated
electromagnetic observables in Refs. [23U2^ and comparison with the sparse data
available confirm that we have observed the emergent phenomenon of collective rotation in
these ab initio calculations for the Be isotopes. At the same time, there are
opportunities for additional theoretical and experimental research to explore, for example,
where rotational bands terminate and whether additional bands may be found in
these and other light nuclei. It appears that bands do not always terminate at the
state corresponding to the maximum angular momentum supported by the nucleons
occupying the standard valence shell model orbitals.


3 Chiral Hamiltonian description of 12C

Recent significant theoretical advances for the underlying Hamiltonians, constructed
within chiral effective field theory (EFT), provide a foundation for nuclear many-body
calculations rooted in QCD [30,31]. These developments motivate us to adopt a chiral
EFT Hamiltonian here and in the following section on neutron drops in an external
trap. We also adopt the similarity renormalization group (SRG) approach [3M37],
which allows us to consistently evolve (soften) the Hamiltonian and other operators,
including 3N interactions [3M40].

Figure 1: Rotational parameters $A$, $a$ and $E_x$ [defined relative to the yrast band
as $E_x = E_0 - E_{0,\mathrm{yrast}}$; see Eq. (1)] for ground and excited bands of the Be isotopes
(adapted from Ref. [H]). Brackets highlight the difference between the parameters
determined from experimental data (horizontal bars) and those extracted from NCFC
calculations with extrapolation (parallel triangles) to the infinite matrix limit. Solid
symbols connected by solid lines indicate the finite matrix results as a function of
increasing $N_{\max}$, with larger symbols for larger $N_{\max}$ values. $N_{\max}$ is defined as
the maximum number of oscillator quanta in the HO configurations above the
minimum for the nucleus under investigation. The minimum $N_{\max}$ is 0 for natural parity
and 1 for unnatural parity. The results indicated by the solid symbols correspond
to $6 \le N_{\max} \le 10$ for natural parity and $7 \le N_{\max} \le 11$ for unnatural parity.

We select the example of the spectroscopy of 12C to illustrate the recent progress.
In so doing, it is important to note that additional progress in achieving larger basis
spaces is needed before we can realistically address cluster model states in light nuclei
such as the celebrated “Hoyle state”, a $0^+$ state at 7.654 MeV of excitation energy
in 12C.

The theoretical excitation spectra are presented in Fig. 2 for the two highest $N_{\max}$
values currently achievable and are compared with experiment. For the negative
parity states, we elect to show excitation energies relative to the lowest state of that
parity, whose experimental energy is 9.641 MeV above the ground state. The trends
with increasing $N_{\max}$ (see the trends for additional observables in Ref. m) suggest
convergence is sufficient to draw important conclusions regarding the underlying
interaction. In particular, we note that the shifts from including the initial 3N interaction
are substantial. In most cases, these shifts improve agreement between theory and
experiment. A notable exception is the $J^\pi = 1^+$, $T = 0$ positive parity state, which
shifts further from experiment when we include the initial 3N interaction.

Figure 2: Theoretical and experimental excitation spectra of 12C for both positive
parity (top panel) and negative parity (bottom panel) states for two different values of
$N_{\max}$ at $\hbar\Omega = 20$ MeV (adapted from Ref. [41]). The columns labelled “chiral NN”
include the 3NF induced by SRG while the subpanels labelled “chiral NN + 3N”
include the initial NN + 3NF evolved by SRG together with NN. The SRG evolution
parameter is $\Lambda = 2.0$ fm$^{-1}$. See Ref. [41] for additional details.

From our results in 12C we conclude that we need further improvements in the
chiral interactions. For example, we need to have NN and 3N interactions at the same
chiral order to be consistent. We also need to extend the chiral order of the interactions
to N4LO and, possibly, to include the derived four-nucleon (4N) interactions.


4 Confined neutron drops with chiral Hamiltonians 

There are many motivations for considering artificial pure neutron systems confined 
by an external trap. 

• Gain insights into the properties of systems dominated by multi-neutron degrees 
of freedom such as unstable neutron-rich nuclei and neutron stars. 

• Isolate selected isospin components of the NN (T = 1) and 3N (T = 3/2)
interactions for detailed study.

• Inform the development of nuclear energy density functionals that may be tuned 
to reproduce ab initio calculations, complementing their tuning to experimental 
data. 

The external trap is required since realistic interactions do not bind pure neutron 
systems, though they do produce net attraction when the systems are confined. The
main foci are to observe differences among realistic interactions and to see if subshell 
closures are predicted. For example, one may investigate spin-orbit splitting as a 
function of the chosen interaction and as a function of the external field parameters. 
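As a point of reference only (this is not one of the calculations reported here), the energy of non-interacting neutrons in a spherical HO trap can be written down by filling oscillator shells. The sketch below assumes a trap with $\hbar\Omega = 10$ MeV and ignores all interactions, including spin-orbit effects, so it shows only major-shell closures. The function names are illustrative.

```python
def ho_noninteracting_energy(n_neutrons, hbar_omega=10.0):
    """Ground-state energy (MeV) of non-interacting neutrons in a 3D HO trap."""
    energy = 0.0
    remaining = n_neutrons
    shell = 0
    while remaining > 0:
        # Major shell N holds (N+1)(N+2) spin-1/2 neutrons.
        capacity = (shell + 1) * (shell + 2)
        occupied = min(remaining, capacity)
        energy += occupied * (shell + 1.5) * hbar_omega
        remaining -= occupied
        shell += 1
    return energy


def closed_shell_numbers(max_shell):
    """Neutron numbers at which major HO shells are completely filled."""
    totals = []
    running = 0
    for shell in range(max_shell + 1):
        running += (shell + 1) * (shell + 2)
        totals.append(running)
    return totals
```

Deviations of interacting results from this baseline, for instance additional binding at neutron numbers other than 2, 8, 20 and 40, are one way to look for the subshell closures and spin-orbit splittings mentioned above.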

Using the same realistic chiral NN + 3N interactions as used in the previous
section, we investigated [I^fTB] neutron drop systems in a 10 MeV HO trap. In Ref. [16]
we compared the results with those from Green’s Function Monte Carlo (GFMC) and
auxiliary field diffusion Monte Carlo (AFDMC) [42,43] using the Argonne $v_8'$ (AV8')
NN interaction [33] and the Urbana IX (UIX) 3N interaction. We also compared
with GFMC and AFDMC results using AV8' with the Illinois-7 (IL7) 3N
interaction [44,45].

For the investigations in Ref. m we employed both NCFC and coupled cluster 
(CC) methods. By implementing CC, we were able to obtain results for larger neutron 
drop systems. 

We found important dependences on the selected interactions, as shown in Fig. 3,
which should have an impact on phenomenological energy-density functionals that
may be derived from them. Note in Fig. 3 that, with increasing N, the chiral
predictions lie between results from different high-precision phenomenological interactions,
i.e., between AV8'+UIX and AV8'+IL7. It will be very important to see the influence
that these different interactions have on energy density functionals.

One also notices in Fig. 3 that there are surprisingly weak contributions from the
inclusion of the chiral 3N interaction. Based on systematic trends shown in previous
neutron-drop investigations [42,43,46] with non-chiral interactions, we anticipate these
conclusions will persist over a range of HO well strengths. Additional investigations
are in progress to confirm this hypothesis and to extend the results to higher neutron
numbers.


5 Computational accelerators and decoupling transformations

Fundamental physics investigations with chiral NN + 3N interactions require
forefront computational techniques in order to efficiently utilize leadership computational
facilities. Many of our efforts aim to develop new algorithms that exploit the
recent advances in hardware and software. Here we describe one of those projects
that could only have been accomplished through our multidisciplinary team working
in close collaboration.

This specific project focused on adapting our NCSM code, Many-Fermion Dynamics
- nuclear (MFDn), for use with GPU accelerators on the supercomputer Titan
at Oak Ridge National Lab. MFDn represents the input NN and 3N interactions in
the “coupled-JT” basis with coupled angular momentum and isospin, exploiting
rotational symmetry and isospin conservation to reduce memory requirements [261I381HD].
In one representative case, storing a 3N input interaction in the coupled-JT basis
reduces the interaction file size from 33 GigaBytes (GB) to less than 0.5 GB. This
method is crucial for pushing the boundaries of problem sizes that we can address,
as the input interactions must be stored once per process; using the ideal process
configuration on Titan, processes have access to 16 GB each. Such a reduction in
memory usage, then, not only enables calculations with larger input interactions,
which are required for larger model spaces, but also makes their memory footprints
more manageable, leaving more room for the memory-limited NCSM calculation.

As a side-effect of this compression, as we construct the full many-nucleon
Hamiltonian from the input NN and 3N interactions, we must perform basis transformations
to extract input interaction matrix elements that our code can use directly. These
basis transformations are both computationally intensive and amenable to
parallelization; they are a natural fit for Titan’s GPU accelerators. We have taken advantage of
our multidisciplinary team of physicists, computer scientists, and applied
mathematicians to port this section of our code to the GPU and optimize it [47]. Integrating the
GPU-accelerated basis transformation into MFDn produces a speedup of 2.2x-2.7x in
the many-nucleon Hamiltonian construction, as illustrated in Fig. 4, and a speedup
of 1.2x-1.4x in the full calculation, with some variation depending on the particular
problem chosen [15].

Figure 3: Comparison of ground state energies of systems with N neutrons trapped
in a HO with strength 10 MeV. Solid red diamonds and blue dots signify results
with NN + 3N interactions derived from chiral effective field theory related to QCD.
The inset displays the ratio of NN + 3N to NN alone for the different interactions
with the error indicated on the far right of each curve where it is maximum. The
label indicates the many-body methods employed: (Importance-Truncated) No Core
Shell Model ((IT-)NCSM); Coupled Cluster including Triples (ΛCCSD(T)); Quantum
Monte Carlo (QMC). Figure adapted from Ref. [TB].

Figure 4: Speedup in the many-nucleon Hamiltonian construction stage due to
implementation on GPU accelerators, graphed against the number of nonzero matrix
elements in the Hamiltonian. There is no clear trend, but all speedups are in
approximately the same region, indicating good weak scaling across this range of problem
sizes. We graph matrix construction speedup here instead of overall speedup; overall
speedup depends strongly on how long the matrix diagonalization takes, which is a
function of the number of eigenstates required. Figure adapted from Ref. [15].
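The relation between the stage speedup and the overall speedup can be illustrated with Amdahl's law. This is a back-of-the-envelope sketch, not an analysis from the cited work. It assumes that only the Hamiltonian construction is accelerated and that the lower (and upper) ends of the two quoted ranges belong to the same runs, which the text does not state.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: total speedup when a runtime fraction is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


def implied_fraction(stage_speedup, overall):
    """Runtime fraction spent in the accelerated stage, inverted from Amdahl's law."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # Pairing of the quoted range endpoints is an assumption made for illustration.
    for stage, total in ((2.2, 1.2), (2.7, 1.4)):
        frac = implied_fraction(stage, total)
        print(f"stage {stage}x, overall {total}x -> construction fraction {frac:.2f}")
```

Under these assumptions the construction stage would account for roughly a third to a half of the original runtime, consistent with the remark in the Figure 4 caption that the overall speedup depends strongly on how long the diagonalization takes.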


6 Overlapping communications and calculations 

Our configuration interaction (CI) approach to the nuclear many-body problem
results in a large sparse matrix eigenvalue problem with a symmetric real Hamiltonian
matrix. This presents major technical challenges and is widely recognized as
“computationally hard.” One of the popular methods for obtaining the low-lying eigenvalues
and eigenvectors is the Lanczos algorithm that we have implemented in MFDn. As
the problem size increases with either increasing basis spaces or with the inclusion
of 3N interactions, we face the challenge of communication costs rising with the
increased numbers of nodes used in the calculations. The increase in nodes is driven by
memory requirements as mentioned in the previous section.
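For readers unfamiliar with the Lanczos algorithm, the following is a minimal serial sketch in pure Python. It is not the MFDn implementation: it uses full reorthogonalization, dense lists, and a simple bisection on the resulting tridiagonal matrix to find the lowest eigenvalue. All function names are illustrative assumptions.

```python
import math


def lanczos(matvec, start, steps):
    """Return the diagonal and off-diagonal of the Lanczos tridiagonal matrix.

    matvec : function applying the symmetric matrix to a vector (list of floats)
    start  : nonzero starting vector
    steps  : maximum number of Lanczos iterations
    """
    norm = math.sqrt(sum(x * x for x in start))
    basis = [[x / norm for x in start]]
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(basis[-1])
        alphas.append(sum(wi * vi for wi, vi in zip(w, basis[-1])))
        # Full reorthogonalization against all previous Lanczos vectors
        for v in basis:
            overlap = sum(wi * vi for wi, vi in zip(w, v))
            w = [wi - overlap * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12:  # invariant subspace reached
            break
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas[:len(alphas) - 1]


def count_below(alphas, betas, x):
    """Number of tridiagonal eigenvalues smaller than x (Sturm sequence count)."""
    count = 0
    d = 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid division by zero in the next step
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alphas, betas, tol=1e-12):
    """Lowest eigenvalue of the tridiagonal matrix by bisection."""
    n = len(alphas)
    radii = [(betas[i - 1] if i > 0 else 0.0) + (betas[i] if i < n - 1 else 0.0)
             for i in range(n)]
    lo = min(a - r for a, r in zip(alphas, radii))  # Gershgorin bounds
    hi = max(a + r for a, r in zip(alphas, radii))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

The dominant cost per iteration is the call to `matvec`, which in the distributed setting is the sparse matrix-vector multiplication whose communication pattern is discussed next. The orthogonalization step requires vector-vector products and hence global reductions.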

In order to reduce communication costs, we developed an efficient mapping of
the eigensolver onto the available hardware with a “topology-aware” mapping
algorithm nain]. We also developed an improved Lanczos algorithm that overlaps
communications with calculations [I411I3-.

For the challenge of efficiently overlapping communications with calculations, we
worked with a hybrid MPI-OpenMP implementation and delegated one or a few
threads to perform inter-process communication tasks, while the remaining threads
carried out the multi-threaded computational tasks. In our algorithm, we also
implemented dynamic scheduling of the computations among the threads for the
sparse matrix-vector multiplication (SpMV) so that, once a communication thread
completes its communication task, it can participate in the multi-threaded computations.
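The scheduling idea can be sketched with Python threads standing in for OpenMP threads and a placeholder callable standing in for MPI communication. This is a structural illustration only: the names `overlapped_spmv` and `exchange` are hypothetical, and Python's global interpreter lock means the sketch shows the control flow rather than any real speedup.

```python
import queue
import threading


def spmv_rows(rows, x, y, start, stop):
    """Compute y[i] for rows start..stop-1; each row is a list of (col, value)."""
    for i in range(start, stop):
        y[i] = sum(val * x[col] for col, val in rows[i])


def overlapped_spmv(rows, x, exchange, n_workers=3, chunk=2):
    """SpMV with one thread communicating while the others compute.

    exchange : callable standing in for inter-process communication
    Returns the product vector and whatever exchange() returned.
    """
    n = len(rows)
    y = [0.0] * n
    tasks = queue.Queue()
    for start in range(0, n, chunk):
        tasks.put((start, min(start + chunk, n)))

    def drain():
        # Dynamic scheduling: each thread takes the next unprocessed chunk.
        while True:
            try:
                start, stop = tasks.get_nowait()
            except queue.Empty:
                return
            spmv_rows(rows, x, y, start, stop)

    received = []

    def communicate():
        received.append(exchange())
        drain()  # communication finished: join the computation

    threads = [threading.Thread(target=drain) for _ in range(n_workers)]
    threads.append(threading.Thread(target=communicate))
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # thread synchronization point
    return y, received[0]
```

Because the chunks are handed out from a shared queue, a communication thread that finishes early simply picks up remaining work, and no thread waits on a fixed static partition. Each chunk writes to disjoint entries of `y`, so no locking of the output vector is needed.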
Figure 5: Comparison of SpMV and communication methods for an iteration of the
Lanczos algorithm carried out by the majority of the processing units, the ones that
store the off-diagonal blocks of the Hamiltonian matrix. The left subfigure displays
a traditional sequential process that may be implemented with MPI. The right
subfigure presents our algorithm suitable for hybrid MPI-OpenMP. Yellow ovals depict
communication and rectangles depict computation. The red rectangle indicates where
we require thread synchronization, which incurs a small additional cost. The figure is
adapted from Refs. muz].


In Fig. 5 we compare a straightforward SpMV implementation using sequential
steps (left subfigure) with our algorithm (right subfigure). By mapping MPI
processes in a balanced column-major order as well as developing and implementing our
algorithm to overlap communications and calculations, we achieved over 80% parallel
efficiency through reduction in communication overhead during the Lanczos iteration
process. This includes both the SpMV and orthogonalization steps that occur in each
iteration. We also found major improvements in the scalability of the eigensolver,
especially after adopting our topology-aware mapping algorithm. Since SpMV and
vector-vector multiplication of these types are common to many other iterative
methods, we believe our achievements have a wide range of applicability.


7 Future prospects 

Most of our applications have focused on light nuclei with mass number $A \le 16$,
where our theoretical many-body methods have achieved successes with leadership
class facilities. However, the frontiers of our field include applications to heavier nuclei
and utilizing new and improved interactions from chiral effective field theory. At the
same time, we aim to evaluate observables with increasing sophistication using their
operators also derived within chiral effective field theory. We mention neutrinoless
double beta decay as one exciting example of frontier research with ab
initio computational nuclear theory.

We therefore face the dual challenge of advancing the underlying theory at the
same time as advancing the algorithms to keep pace with the growth in the size and
complexity of leadership class computers. Recent history in these efforts, with the
substantial support of the funding agencies, indicates we are experiencing a “Double
Moore’s Law” rate of improvement, i.e., Moore’s Law for hardware improvements
and a simultaneous Moore’s Law improvement in the algorithms/software. We need
continued support for multi-disciplinary collaborations and growth in leadership class
facilities in order to achieve the full discovery potential of computational nuclear
physics.